4.7 The F-test for Comparing Multiple Classifiers

If the data had been measured variables that appeared normally distributed, instead of a collection of 1’s and 0’s, the F-test would be almost automatically applied as the appropriate method. (Cochran 1950)

The passage goes on to say that the F-test also serves as an approximation when the data are a collection of 1's and 0's

The method of using the F-test for comparing two classifiers in this section is somewhat loosely based on Looney.

Looney recommends an adjusted version called the F+ test

The F-test

Null hypothesis: there is no difference between the classification accuracies

Let {C1, ..., CM} be a set of classifiers which have all been tested on the same dataset.

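If the M classifiers do not differ (that is, under the null hypothesis), the F statistic follows an F-distribution with (M-1) and (M-1)*n degrees of freedom

n is the number of samples in the test set

ACC_avg: the average of the accuracies of the M models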

The sum of squares of the classifiers: SSA

G_j: the proportion of the n samples that classifier j classified correctly

The sum of squares for the objects (an object here is a test sample): SSB

M_j: the number of classifiers, out of M, that correctly classified test sample x_j (at most M)

The total sum of squares: SST

The sum of squares for the classification-object interaction: SSAB
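
The notes above name each quantity without showing its formula. Assuming the usual form of this test as attributed to Looney (worth checking against the original text), the definitions can be written out as below. Following the notation above, the index j runs over the M classifiers in G_j and over the n test samples in M_j.

```latex
\begin{align*}
\mathrm{ACC}_{avg} &= \frac{1}{M} \sum_{j=1}^{M} G_j \\
\mathrm{SSA} &= n \sum_{j=1}^{M} G_j^2 - n \cdot M \cdot \mathrm{ACC}_{avg}^2 \\
\mathrm{SSB} &= \frac{1}{M} \sum_{j=1}^{n} M_j^2 - M \cdot n \cdot \mathrm{ACC}_{avg}^2 \\
\mathrm{SST} &= M \cdot n \cdot \mathrm{ACC}_{avg} \, (1 - \mathrm{ACC}_{avg}) \\
\mathrm{SSAB} &= \mathrm{SST} - \mathrm{SSA} - \mathrm{SSB}
\end{align*}
```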

To compute the F statistic, we next compute the mean SSA and mean SSAB values:

The mean SSA is MSA

The mean SSAB is MSAB

The F-value is MSA/MSAB
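
As a minimal sketch of the computation so far, the function below derives the F-value from a table of per-sample correctness. The name `f_statistic` and the input layout (`correct[j][i]` is 1 if classifier j got test sample i right) are assumptions made for illustration. The divisors M-1 for MSA and (M-1)(n-1) for MSAB are assumed from the standard definition of this test, since the notes only say "mean".

```python
def f_statistic(correct):
    """F-value for M classifiers evaluated on the same n test samples.

    correct[j][i] is 1 if classifier j classified sample i correctly, else 0
    (hypothetical input layout chosen for this sketch).
    """
    M = len(correct)      # number of classifiers
    n = len(correct[0])   # number of test samples

    # G_j: accuracy of classifier j; ACC_avg: mean accuracy over classifiers
    G = [sum(row) / n for row in correct]
    acc_avg = sum(G) / M

    # M_j: how many of the M classifiers got each test sample right
    hits = [sum(correct[j][i] for j in range(M)) for i in range(n)]

    ssa = n * sum(g ** 2 for g in G) - n * M * acc_avg ** 2
    ssb = sum(h ** 2 for h in hits) / M - M * n * acc_avg ** 2
    sst = M * n * acc_avg * (1 - acc_avg)
    ssab = sst - ssa - ssb

    msa = ssa / (M - 1)
    msab = ssab / ((M - 1) * (n - 1))
    return msa / msab


# Toy data: two classifiers, four test samples
example = [[1, 1, 1, 0],
           [1, 0, 0, 0]]
print(f_statistic(example))  # approximately 3.0
```

The sketch does not guard against MSAB being zero, which happens for example when all classifiers make identical predictions.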

After computing the F-value, we can then look up the p-value from an F-distribution table for the corresponding degrees of freedom or obtain it computationally from a cumulative F-distribution function.

Compare the p-value with the significance level to decide whether to reject the null hypothesis

If the null hypothesis is rejected, run multiple pairwise post hoc tests

McNemar tests with a Bonferroni correction
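
A rough sketch of this post hoc step is given below. It assumes the continuity-corrected McNemar statistic, (|b - c| - 1)^2 / (b + c), compared against a chi-squared distribution with one degree of freedom, and the plain Bonferroni rule of dividing alpha by the number of pairs. The function names and the input layout (one 0/1 correctness list per classifier) are hypothetical.

```python
import math
from itertools import combinations


def mcnemar_p_value(correct_a, correct_b):
    """Continuity-corrected McNemar test on two 0/1 correctness lists."""
    # b: A right, B wrong; c: A wrong, B right (the discordant pairs)
    b = sum(1 for x, y in zip(correct_a, correct_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(correct_a, correct_b) if x == 0 and y == 1)
    if b + c == 0:
        return 1.0  # the two classifiers never disagree
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def pairwise_mcnemar_bonferroni(correct, alpha=0.05):
    """Test every pair of classifiers against a Bonferroni-adjusted alpha."""
    pairs = list(combinations(range(len(correct)), 2))
    adjusted_alpha = alpha / len(pairs)
    results = {}
    for i, j in pairs:
        p = mcnemar_p_value(correct[i], correct[j])
        results[(i, j)] = (p, p < adjusted_alpha)
    return results
```

Each entry of the returned dictionary maps a classifier pair to its p-value and a flag saying whether the difference is significant at the adjusted level. When the number of discordant pairs b + c is small, an exact binomial version of the test is usually preferred over this chi-squared approximation.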

F-test implementation: (TODO: try it out) ftest: F-test for classifier comparisons